Andrew Zammit Mangion
09 October 2014
Environmental systems are characterised by phenomena which evolve both in space and in time (cloud modelling, weather and climate, glacial flow, spread of invasive species etc.).
The aim of spatio-temporal modelling is to
\( \def\bold#1{\mathbf #1} \) \( \def\yvec{\bold{y}} \) \( \def\svec{\bold{s}} \) \( \def\vvec{\bold{v}} \) \( \def\wvec{\bold{w}} \) \( \def\gammab{\bold{\gamma}} \) \( \def\thetab{\bold{\theta}} \) \( \def\kvec{\bold{k}} \) \( \def\phib{\bold{\phi}} \) \( \def\psib{\bold{\psi}} \) \( \def\evec{\bold{e}} \) \( \def\xvec{\bold{x}} \) \( \def\Qmat{\bold{Q}} \) \( \def\Amat{\bold{A}} \) \( \def\betab{\bold{\beta}} \) \( \def\Lmat{\bold{L}} \) \( \def\Thetab{\bold{\Theta}} \) \( \def\Jmat{\bold{J}} \) \( \def\Cmat{\bold{C}} \) \( \def\Tmat{\bold{T}} \)
Divide an conquer approach to a complext problem.
\[ \yvec_t = g(X_t(\svec), \vvec_t; \thetab_h) \]
\[ X_t(\svec) = f(\svec,t,\wvec_t(\svec); \thetab_m) \]
\[ p(\thetab_m ; \gammab_m), p(\thetab_h; \gammab_h), p(\thetab_v ; \gammab_v), p(\thetab_w; \gammab_w) \]
Two approaches:
\[ X_t(\svec) \sim GP(\mu_t(\svec),K(t,t',\svec,\svec')) \]
General-purpose model, used when the dynamic structure is largely unknown.
\[ X_t(\svec) = f(X_{t-1}(\svec), X_{t-2}(\svec), \dots, \wvec_t(\svec); \thetab_m) \]
\[ \frac{\partial X(s;t)}{\partial t} - \beta \frac{\partial^2X(s;t)}{\partial s^2} + a X(s;t) = e(s;t) \]
where \( \alpha >0, \beta > 0 \) and \( e(s;t) \) is a random, zero mean error process. Then the geo-statistical approach requires use of covariance function:
\[ K(u;v) = \frac{K(0;0)}{2}\left(e^{-u^{\left( \frac{\alpha}{\beta}\right)^\frac{1}{2}}} Erfc\left(\frac{2v\left( \frac{\alpha}{\beta}\right)^\frac{1}{2} - \frac{u}{\beta}}{2\left( \frac{v}{\beta}\right)^\frac{1}{2}} \right) + \\ e^{u^{\left( \frac{\alpha}{\beta}\right)^\frac{1}{2}}} Erfc\left(\frac{2v\left( \frac{\alpha}{\beta}\right)^\frac{1}{2} + \frac{u}{\beta}}{2\left( \frac{v}{\beta}\right)^\frac{1}{2}}\right) \right) \]
Continuous-time ST processes are computationally tedious to deal with.
Focus on models of the form
\[ X_t(\svec) = f(X_{t-1}(\svec)) + e_t(\svec) \]
where \( ~e_t(\svec)~ \) is the solution to
\[ (\kappa^2 - \Delta)^{\alpha/2}(\tau e(\svec)) = \mathcal{W(\svec)} \]
where \( \mathcal{W(\svec)} \) is a space-time (white) Wiener process.
The SPDE
\[ (\kappa^2 - \Delta)^{\alpha/2}(\tau e(\svec)) = \mathcal{W(\svec)} \]
has as solution (Whittle, 1954)
\[ e(\svec) \sim \mathcal{GP}(0,K(\svec,\svec'; \alpha,\tau,\kappa)) \]
where \( k(\cdot,\cdot) \) is a Matérn covariance function. The wave spectrum of \( e(\svec) \) is given by
\[ R(\kvec) = (2\pi)^{-\alpha}(\kappa^2 + \| \kvec \|^2)^{-\alpha} \]
Non-stationarity can be handled by making the SPDE spatially varying. This results in a covariance function which is not a Matern field, but do we care?
We can model SPDEs on manifolds, not restricted to \( \mathbb{R}^2 \).
SPDEs can be approximated to arbitrary precision using an appropriate basis (Lindgren, 2011).
\[ X_{t+1}(\svec) = \mathcal{A}X_t(\svec) + e_t{\svec} \]
\[ \langle \phib,X_{t+1}(\svec) \rangle = \langle \phib,\mathcal{A}X_{t+1}(\svec) \rangle + \langle \phib,e_t(\svec)\rangle \]
\[ \langle \phib,\psib^T \rangle\xvec_{t+1} = \langle \phib,\mathcal{A}\psib^T \rangle\xvec_t + \langle \phib,\psib^T\rangle\evec_t \]
For \( \alpha = 2 \) and \( \psib = \phib \) (Galerkin method):
\[ [ \langle\phib, \kappa^2\phib^T\rangle - \langle\phib, \Delta\phib^T\rangle)]\tau \evec = \langle \phib,\mathcal{W(\svec)} \rangle \]
\[ \Large \evec \sim \mathcal{N}(\bold{0},\Qmat^{-1}) \] where \( \Qmat \) is sparse.
(Lindgren, 2014)
Useful when we have multiple interacting spatio-temporal processes
Consider the bi-variate sub-system of random forcings: \( e_{1,t}(\svec), e_{2,t}(\svec) \). Define the multi-variate Gaussian process (MVGP) as
\[ (e_{1,t}(\svec), e_{2,t}(\svec)) = \bold{e}(\svec) \sim \mathcal{MVGP}(\bold{0},\bold{K}(\svec,\svec',t,t';\thetab_K)) \]
where \( \bold{K}(\svec,\svec',t,t';\thetab_K) \) is a \( 2 \times 2 \) cross-covariance matrix function (i.e.~ each element of \( \bold{K} \) is a covariance function (Finley, 2007):
\[ \bold{K}(\svec,\svec',t,t') = [Cov(e_{i,t}(\svec), e_{i,t'}(\svec'))]_{i,j \in \{1,2\}} \]
\[ \begin{bmatrix} \mathcal{A}_1 & & \\ & & \mathcal{A}_2 & \mathcal{A}_{2,3} \\ & & \mathcal{A}_{3,2} & \mathcal{A}_3 \end{bmatrix} \begin{bmatrix} e_1(\svec) \\ e_2(\svec) \\ e_3(\svec) \end{bmatrix} = \begin{bmatrix} \mathcal{W}_1(\svec) \\ \mathcal{W}_2(\svec) \\ \mathcal{W}_3(\svec) \end{bmatrix} \]
SPDEs are a useful way to modelling complex phenomena.
Real power comes in using finite elements to decompose and represent them as GMRFs
From now on we restrict ourselves to the linear, Gaussian case
\[ \begin{bmatrix} \xvec_{1,t} \\ \xvec_{2,t} \\ \vdots \end{bmatrix} = \begin{bmatrix} \Amat_{1} & \Amat_{1,2} & \dots \\ \Amat_{2,1}& \Amat_{2,2} & \dots \\ \vdots& \vdots & \ddots \\ \end{bmatrix} \begin{bmatrix} \xvec_{1,t-1} \\ \xvec_{2,t-1} \\ \vdots \end{bmatrix} + \bold{J}_t\bold{\beta} + \begin{bmatrix} \evec_{1,t} \\ \evec_{2,t} \\ \vdots \end{bmatrix} \]
Collapse all variates, and their spatial and temporal components into one big GMRF
Sparsity is retained and one can make use of fill-in reduction permutations when computing the Cholesky factor (Knorr-Rue and Held, 2002)
\[ p(\xvec,\betab_I | \Thetab) = \left(\prod_{i=1}^T{p(\xvec_{t} | \xvec_{t-1},\betab,\Thetab)}\right)p(\xvec_0 | \betab, \Thetab)p(\betab_I) \]
\[ \xvec_0|\betab_I,\Thetab \sim \mathcal{N}(\bold{J}_0\betab,[\Qmat_w - \Amat^T\Qmat_w\Amat]^{-1}) \]
\( (\xvec,\betab) \sim \mathcal{N}(\bold{0},\Qmat^{-1}) \) where
\[ \small \begin{array}{cc} \Qmat = \begin{bmatrix} \Qmat_1 & \Qmat_2 \\ \Qmat_2^T & \Qmat_3 \end{bmatrix} & \Qmat_2 = \begin{bmatrix} \Amat^T\Qmat_w\Jmat_{1} - \Qmat_0\Jmat_{0} \\ \Amat^T\Qmat_w\Jmat_{2} - \Qmat_w\Jmat_1 \\ \vdots\end{bmatrix}, \end{array} \]
\[ \small \Qmat_1 = \begin{bmatrix} \Amat^T\Qmat_w\Amat + \Qmat_0 & -\Amat^T\Qmat_w & ~~~ \\ -\Qmat_w\Amat & \Amat^T\Qmat_w\Amat + \Qmat_w& -\Amat^T\Qmat_w & ~~~ \\ & -\Qmat_w\Amat & \ddots & \ddots \\ & \ddots & \ddots & \ddots & \end{bmatrix} \]
\[ \small \Qmat_3 = \Jmat_0^T\Qmat_0\Jmat_0 + \sum_{i=1}^{T} (\Jmat_i^T\Qmat_w\Jmat_i) + \Qmat_\beta \]
Lemma 1: Takahashi Equations (Erisman and Tinney, 1975) Let \( \Lmat = chol(\Qmat) \). If \( L_{ij} \ne 0 \), then \( \Sigma_{ij} \) can be computed using elements in
- \( \Lmat \) and
- \( \Sigma^*_{IJ}, I = \{ i'; i' > i\}, J = \{j'; j' > j\} \)
Theorem 1: Let \( \yvec = \Cmat\xvec + \vvec \) and \( \vvec \sim \mathcal{N}(0,\Tmat^{-1}) \). Then if
- \( \textrm{zeros}(\Amat) \subseteq \textrm{zeros}(\Cmat) \)
- \( \Amat \) is non-negative
- \( \Tmat \) is diagonal, then \( diag(\Amat \Sigma^* \Amat^T) \equiv diag(\Amat \Sigma^*_{00} \Amat^T) \)
Lemma 2: If \( \Amat \) is non-negative then \( [\Amat^T\Amat]_{jk} = 0 \Leftrightarrow diag(\Amat\Sigma^*\Amat^T) \) does not depend on \( \Sigma^*_{jk} \)
\[ \begin{align} zeros(\Qmat^*) &= zeros(\Qmat + \Amat^T\Tmat^{-1}\Amat) \\ &\ge zeros(\Amat^T\Tmat^{-1}\Amat) = zeros(\Amat^T\Amat) \end{align} \]
\[ zeros(\Lmat^*)_{jk} \ge zeros(\Qmat^*)_{jk} \\ \ge zeros(\Amat^T\Amat) \]
\[ zeros(\Sigma_{00}) \ge zeros(\Lmat^*)_{jk} \]
Asub <- getC(e[[1]])[1:5000,]
system.time({ est <-diag( Asub %*% Results$Partial_Cov %*% t(Asub)) })
> user system elapsed
> 5.232 0.465 5.698
Lp <- Results$Qpermchol
P <- Results$P
system.time({X <- solve(Lp, t(P)%*%t(Asub))
est <- crossprod(X,X)} )
> user system elapsed
> 221.250 0.382 221.688
In practice the Takahashi equations are always used with GMRFs in order to obtain the marginal variances. Theorem 2 provides a means to obtain uncertainty on linear combinations essentially for free.
I can provide predictive uncertainties at any point provided that each row of \( zeros(\Amat) \subseteq zeros(\Cmat) \).
\[ x_i \perp \xvec_{-\{i,ne(i)\}} | \xvec_{ne(i)} ~~~ \forall i \in \mathcal{V} \]
\[ E(x_i | \xvec_{-i}) = -\frac{1}{Q_{ii}}\sum_{j:j\sim i}Q_{ij}x_j; ~~~ prec(x_i | \xvec_{-i}) = Q_{ii} \]
Theorem 3: Any planar map divided into contigious regions can be coloured using at most FOUR colours (Appel and Haken, 1976)
Theorem 3: Any planar map divided into contigious regions can be coloured using at most FOUR colours (Appel and Haken, 1976)
\[ \xvec_A \perp \xvec_B | \xvec_{C} \]
\( \forall A,B,C \notin \varnothing \) and \( C \) separating \( A,B \).
In typical Bayesian networks, finding \( A, B \) and \( C \) can be difficult (seed and spawn idea, Gonzales, 2011).
With a spatial network, this is remarkably easy using the Kernighan-Lin algorithm for graph partitioning (Kernighan and Lin, 1970).
Let \( G(V,E) \) be a graph, and let \( V \) be the set of nodes and \( E \) the set of edges.
Find a partition of two disjoint, equal-size subsets \( A,B \), such that the sum of the weights of the edges between nodes in A and B is minimized.
Setting edge weights = 1, gives a bisection algorithm!
Sampling from the prior guarantees enormous speed-ups using MPI
Observations with large spatial footprints cause the four-colour theorem to fail.
\[ \begin{align} \widehat\Qmat^{ii} \hat\xvec^i &= \Cmat^{i^T}\Qmat\left(\yvec - \sum_{j \in [1\dots d]/i}\Cmat^j\hat\xvec^j\right) - \\ &~~~~ \sum_{j \in [1\dots d]/i} \left(\Qmat^{ij}\hat\xvec^j \right) \end{align} \]
No two adjacent nodes in a super-graph can have the same colour and no two nodes of the same colour can be spanned by the same observation
Multi-variate spatio-temporal statistics has a lot to offer in a large range of environmental problems which remain untapped.
Computational properties of GMRFs allow us to consider major sources of uncertainty with influence on experimental design: e.g. in multivariate spatial problems, coverage is not as important as mixture diversity
Paramter estimation with large-scale spatio-temporal problems is hard but usually possible (INLA, VB, MCMC), but with large-scale multi-variate spatio-temporal under-determined problems considerably harder. Parallel sampling offers a possible solution.
Parallel sampling can also become intractable, what about parallel approximate Bayesian sparse methods? http://arxiv.org/abs/1305.4152
Jonathan Rougier (University of Bristol)
Botond Cseke (University of Edinburgh)
Finn Lindgren (University of Bath)